Poster: | hank_b | Date: | Mar 9, 2010 11:49pm |
Forum: | texts | Subject: | Re: PDF's with text layer |
Having significant numbers of PDFs with text layers uploaded is a relatively new thing - our processing pathways were all originally designed to work from images, not PDFs. We do now derive from contributed PDFs, but only through the makeshift approach of extracting the images from the PDF and processing them as though they came from scanning a book. That means the text you see (via the "Full Text" link) comes from running OCR on your images, not from the PDF's own text layer, so yes, it has the same (limited) level of accuracy as the OCR we produce for scanned books.
Extracting text directly from contributed PDFs is on the to-do list, but I'm afraid I can't give you any estimate of when it will happen.
Hank Bromley
software engineer
Internet Archive
Reply
Poster: | genet | Date: | Jan 23, 2011 12:14pm |
Forum: | texts | Subject: | Re: PDF's with text layer |
Thanks,
-Gene
Reply
Poster: | hank_b | Date: | Jan 23, 2011 12:32pm |
Forum: | texts | Subject: | Re: PDF's with text layer |
Unfortunately, no, nothing new to report on using the text layer found within contributed PDFs.
-- Hank
Reply
Poster: | virtualverse0 | Date: | Apr 19, 2017 7:03am |
Forum: | texts | Subject: | Re: PDF's with text layer |
This post was modified by virtualverse on 2017-04-19 14:03:06
Reply
Poster: | genet | Date: | Jan 23, 2011 12:36pm |
Forum: | texts | Subject: | Re: PDF's with text layer |
-Gene